Teaching a Neural Network to See Past Dithering

A Childhood in 16 Colors

There was a game called Per.Oxyd that I played obsessively as a kid, a puzzle game where you rolled a marble through elaborate levels, flipping tiles, avoiding traps. The game was brilliant. The textures were not.

Per.Oxyd shipped with a 16-color palette. To simulate gradients and smooth surfaces, the artists used Floyd-Steinberg dithering: scattering colored pixels in patterns that fool the eye at a distance but look unmistakably artificial up close. Every texture was a stippled mosaic of the same 16 colors arranged to approximate shades the hardware couldn't display.

I always wondered what those textures would look like in full color.

Architecture

The model is based on ESPCN (Efficient Sub-Pixel Convolutional Neural Network), a lightweight architecture originally designed for super-resolution. I adapted it for de-dithering: same input resolution, same output resolution, but the network learns to map dithered pixel patterns back to their intended smooth colors.

Five convolutional layers expand from 3 input channels to 128, narrow to 64, and finish with a PixelShuffle layer for the final RGB output. LeakyReLU activations are used throughout, with a Sigmoid at the end.

Input (3ch) → Conv 128 → Conv 128 → Conv 128 → Conv 64 → Conv n×3 → PixelShuffle → Output (3ch)
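A minimal PyTorch sketch of that stack, for readers who want something concrete. Several details are assumptions, not taken from the original model: the class name `DeditherNet`, padding of 1 to keep the resolution unchanged, the default LeakyReLU slope, LeakyReLU after the first four convolutions only, and a `scale` of 1, so that PixelShuffle leaves the spatial size alone and the last convolution emits exactly 3 channels.

```python
import torch
from torch import nn


class DeditherNet(nn.Module):
    """ESPCN-style stack: five 3x3 convs, then PixelShuffle and Sigmoid."""

    def __init__(self, scale: int = 1, negative_slope: float = 0.01):
        super().__init__()
        # Channel widths follow the diagram: 3 -> 128 -> 128 -> 128 -> 64.
        widths = [3, 128, 128, 128, 64]
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            # padding=1 keeps height and width unchanged (assumption).
            layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1))
            layers.append(nn.LeakyReLU(negative_slope))
        # PixelShuffle(scale) needs 3 * scale**2 input channels to emit RGB.
        layers.append(nn.Conv2d(64, 3 * scale**2, kernel_size=3, padding=1))
        layers.append(nn.PixelShuffle(scale))
        layers.append(nn.Sigmoid())  # squash output into [0, 1]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


net = DeditherNet()
restored = net(torch.rand(1, 3, 32, 32))  # same shape in, same shape out
```

With `scale=1` the PixelShuffle layer is effectively a pass-through, which matches the "same resolution in and out" use described above.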

De-dithering is a local operation. A dithered region encodes its intended color in the statistical distribution of pixels within a small neighborhood. Five 3x3 convolutions give an 11x11 receptive field (each layer adds one pixel on every side), which is plenty for Floyd-Steinberg patterns.
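The receptive field figure is easy to check by hand. The helper below is an illustrative sketch (the function name is made up) that accumulates the window size layer by layer; it assumes plain convolutions with no dilation.

```python
def receptive_field(kernel_sizes, strides=None):
    """Side length of the input window that one output pixel depends on."""
    strides = strides or [1] * len(kernel_sizes)
    field, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        field += (k - 1) * jump  # each layer widens the window by (k - 1) steps
        jump *= s                # strides stretch the step size of later layers
    return field


print(receptive_field([3] * 5))  # five stride-1 3x3 convs -> 11
```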

Training Data

No dataset of Per.Oxyd textures with full-color ground truth exists, so I built one. Take ~2,500 ordinary photographs (4.6 GB), dither them down to Per.Oxyd's exact 16-color palette using ImageMagick's Floyd-Steinberg implementation, and now you have perfect input/output pairs. The network would learn on modern photos and, hopefully, generalize to game textures it had never seen.
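To make the dithering step concrete, here is a small pure-Python sketch of Floyd-Steinberg error diffusion onto a fixed palette. It is a stand-in for illustration only: the actual pairs came from ImageMagick, whose color handling may differ, and the two-color palette below is a placeholder, not Per.Oxyd's 16 colors.

```python
def nearest(color, palette):
    """Palette entry closest to `color` in squared RGB distance."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))


def floyd_steinberg(image, palette):
    """Dither `image` (rows of (r, g, b) tuples, 0-255) down to `palette`."""
    h, w = len(image), len(image[0])
    work = [[[float(c) for c in px] for px in row] for row in image]
    out = [[None] * w for _ in range(h)]
    # Classic weights: right 7/16, below-left 3/16, below 5/16, below-right 1/16.
    taps = ((1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16))
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = nearest(old, palette)
            out[y][x] = new
            err = [o - n for o, n in zip(old, new)]
            # Push the quantization error onto pixels not yet visited.
            for dx, dy, weight in taps:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    for c in range(3):
                        work[ny][nx][c] += err[c] * weight
    return out


# Placeholder palette; the real run used Per.Oxyd's 16 colors.
palette = [(0, 0, 0), (255, 255, 255)]
flat_grey = [[(128, 128, 128)] * 16 for _ in range(16)]
dithered = floyd_steinberg(flat_grey, palette)  # network input; flat_grey is the target
```

The original image serves as the ground truth and the dithered result as the input, which is what makes the pairs "free".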

Each epoch sampled 500 random crops from this pool. The random cropping prevents the network from memorizing specific image layouts and forces it to learn the local dithering-to-color mapping.
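A sketch of how such sampling might look, with images as nested lists of pixels. The function names and the 64-pixel crop size are assumptions (the crop size is not stated above); the one essential detail is that the input and the target are cut at the same offset so they stay aligned.

```python
import random


def random_crop_pair(dithered, target, size, rng=random):
    """Cut the same size x size window out of an aligned input/target pair."""
    h, w = len(target), len(target[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)

    def cut(img):
        return [row[left:left + size] for row in img[top:top + size]]

    return cut(dithered), cut(target)


def sample_epoch(pairs, crops_per_epoch=500, size=64, rng=random):
    """Draw crops_per_epoch crops, picking a random source image for each."""
    return [
        random_crop_pair(*rng.choice(pairs), size, rng)
        for _ in range(crops_per_epoch)
    ]
```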

[Figure: A Per.Oxyd texture dithered down to 16 colors, the input the network needs to reverse.]

Checkerboards

First results were promising but soft. The network recovered smooth gradients from dithered regions, but the output had a watercolor quality: detail was being averaged away.

Then I tried it on actual game textures.

Per.Oxyd uses a distinctive checkerboard pattern for shadows: light squares alternating with dark squares, overlaid on the underlying texture. To a human eye, this reads as a transparent shadow. To my network, it was just another pattern to de-dither, and the model dutifully blended it into uniform grey.

Both checkerboard shadows and Floyd-Steinberg dithering are fine patterns of alternating pixel values; a dithered mid-tone can come out looking very much like a checkerboard. Both look like they encode a smooth underlying color. The model had no basis for distinguishing them.

The fix was a custom data augmentation step. During training, a percentage of input crops got a synthetic checkerboard pattern overlaid, and the ground truth for those crops was the undithered image with the same checkerboard shadow left intact. This taught the network that checkerboard patterns should be preserved as spatial information, not averaged away.

Getting the augmentation parameters right took several iterations. Too much overlay and the network started preserving dithering patterns it should have been smoothing; too little and checkerboard shadows still got eaten. The sweet spot was around 15% of training crops with a 50% opacity checkerboard applied.
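One way to read that augmentation, sketched in plain Python. The defaults mirror the numbers above (15% of crops, 50% opacity), but the rest is assumed: the function names, the single-pixel cell size, modeling the shadow as a blend toward black, and applying the overlay after dithering. The key point is that the same pattern lands on both the input and the target, so the network is rewarded for keeping it.

```python
import random


def checkerboard_shadow(image, opacity=0.5, cell=1):
    """Blend every other cell toward black; pixels are (r, g, b) floats in [0, 1]."""
    return [
        [
            tuple(c * (1 - opacity) for c in px)
            if ((x // cell) + (y // cell)) % 2
            else px
            for x, px in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]


def augment(dithered, target, p=0.15, opacity=0.5, rng=random):
    """With probability p, shadow BOTH crops so the target keeps the pattern."""
    if rng.random() < p:
        return (
            checkerboard_shadow(dithered, opacity),
            checkerboard_shadow(target, opacity),
        )
    return dithered, target
```

Raising `p` or `opacity` corresponds to the "too much overlay" failure mode described above; lowering them corresponds to the shadows getting smoothed away again.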

[Figure: Before and after: the dithered input (left) vs. the network's reconstruction (right), after adding checkerboard augmentation.]

Training

Training was not smooth. Steady improvement for the first 10 epochs, then a collapse between epochs 11 and 14: loss spiked, output quality degraded visibly. By epoch 15 it recovered and continued improving. I never fully diagnosed the collapse (possibly a learning rate issue, possibly the network reorganizing its internal representations as the augmentation data pushed it toward a more complex solution). The model recovered without intervention, which was a lesson in patience.

Best PSNR reached 30 dB, with visually clean output and only minor artifacts in high-frequency regions.
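For a sense of scale, PSNR is a log-scaled mean squared error. The sketch below assumes pixel values in [0, 1], flattened into one sequence across all channels; how the original evaluation normalized or averaged its pixels is not stated here.

```python
import math


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length flat pixel sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

Under that assumption, 30 dB corresponds to a mean squared error of 0.001, or a typical per-channel error of roughly 8 levels out of 255.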

The whole project consumed about 63 hours across multiple iterations, most of that on the checkerboard problem. Synthetic training data, dithering ordinary photos to create input/output pairs, worked better than I'd expected; the distribution mismatch between photos and game textures was smaller than it had any right to be. A VGG-based perceptual loss would probably sharpen the output, but five convolutional layers and a clever augmentation step already got those childhood textures into full color, which was the whole point.

[Figure: The same texture after de-dithering, smooth colors recovered from the 16-color input.]